Exploration of WA DOH School Exemption Data over 3 School Years by Andrew T. Bauman

Introduction

Recent news media coverage and my own personal interest inspired me to explore a school vaccination data set for WA state. WA DOH employees generously assembled 3 data sets for my project, covering school years 2011-2012, 2012-2013, and 2013-2014. The 2011-2012 set is the most comprehensive in terms of schools covered and relevant fields. Each row of a data set is an observation about one school. The 3 data sets were bound into one data set, then explored.

The initial data and cleaning/assembly of the combined data frame can be found here: githubrepo

Additional cleaning and wrangling can be found here: githubrepo

An initial exploration, with some additional corrections for student enrollment, is here: githubrepo

The current state is rough and far from perfect. I plan to revisit it as I discover additional elements that need cleaning. I am currently working on a more readable and organized version of the data wrangling using piping, so that changes are easy to make and push to the .rds file that this project loads as its data.

There are some aspects of vaccination and the spread of disease through populations that are best left to subject matter experts. I consulted CDC publications throughout this analysis and I encourage the reader to do the same.

For example, what is the exemption rate threshold that should not be exceeded to maintain herd immunity? This is a complex question. A typical answer is 10%. That figure rests on assumptions about how populations mix and how fast a contagion spreads. It is even more complex in the context of school data, where school populations vary widely and students may come from a large geographical area. The latter is particularly true of private schools, whose students do not necessarily live within a specific geographical region.

With regard to the spread of disease, some diseases are more contagious than others and have a range of opportunities for exposure. For example, while Hep B is fairly contagious (~ 100 times more than HIV), it is blood-borne, so the opportunity for exposure is less than for something like the common cold. Measles is particularly contagious with a high exposure opportunity, and information available from the CDC suggests that vaccination rates of 95% or lower pose a risk. Bear in mind that this factors in that not all those vaccinated will actually be immune, so even at a 100% vaccination rate, 100% immunity would not be expected.

For the purposes of this analysis, I set the outbreak risk thresholds at 90% for general vaccination rates and 95% for MMR. That translates to 5% exempt for MMR and 10% exempt general.
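The thresholds above can be applied directly to the per-school rate columns. The following is a minimal base-R sketch, not the project's own code: the column names `percent_exempt` and `percent_exempt_MMR` come from the data set, while the toy rows, the threshold variable names, and the `at_risk_*` columns are assumptions made for illustration.

```r
# Exemption-rate thresholds implied by the vaccination-rate thresholds above.
general_threshold <- 100 - 90  # 10% exempt, general
mmr_threshold     <- 100 - 95  # 5% exempt, MMR

# Toy rows for illustration only (not real schools).
schools <- data.frame(
  school_name        = c("School A", "School B", "School C"),
  percent_exempt     = c(2.4, 12.0, 7.5),
  percent_exempt_MMR = c(1.4, 3.0, 6.1),
  stringsAsFactors   = FALSE
)

# Flag schools whose exemption rate exceeds each threshold.
schools$at_risk_general <- schools$percent_exempt > general_threshold
schools$at_risk_mmr     <- schools$percent_exempt_MMR > mmr_threshold

print(schools)
```

A school can be under the general threshold and still over the MMR threshold (School C here), which is why the two flags are kept separate.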

Alter Data

  • Select fields from WA DOH 2011 - 2012, 2012-2013, and 2013 - 2014 School Year data set.
  • Subset of reported schools only
  • Subset by enrollment 10 - 2500 students all school years
  • Subset by enrollment 10 - 2500 students most recent school year (2013)
  • Subset by enrollment 100 - 2500 students
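The subsets listed above could be built along the following lines. This is a hedged base-R sketch with made-up rows: `vaccReported` is the name that appears in the output later in this report, the other object names are hypothetical, and treating a missing `total_exemptions` as "not reported" is an assumption.

```r
# Toy stand-in for the combined data frame (values are made up).
vacc <- data.frame(
  school_year      = factor(c("2011", "2012", "2013", "2013", "2013")),
  enrolled         = c(5, 450, 90, 1200, 3000),
  total_exemptions = c(NA, 20, 8, 35, 60)
)

# Reported only: keep schools that supplied exemption counts (assumed rule).
vaccReported <- subset(vacc, !is.na(total_exemptions))

# Enrollment 10 - 2500 students, all school years.
vacc_10_2500 <- subset(vaccReported, enrolled >= 10 & enrolled <= 2500)

# Same enrollment range, most recent school year (2013) only.
vacc_2013 <- subset(vacc_10_2500, school_year == "2013")

# Enrollment 100 - 2500 students.
vacc_100_2500 <- subset(vaccReported, enrolled >= 100 & enrolled <= 2500)

print(sapply(list(reported = vaccReported, e10_2500 = vacc_10_2500,
                  y2013 = vacc_2013, e100_2500 = vacc_100_2500), nrow))
```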

Univariate Plots Section

Distribution of enrolled students

##  [1] "school_year"            "school_code"           
##  [3] "school_name"            "school_type"           
##  [5] "school_city"            "school_state"          
##  [7] "school_zip"             "school_county"         
##  [9] "school_district"        "district_code"         
## [11] "enrolled"               "total_exemptions"      
## [13] "medical_exempt"         "nonmedical_exempt"     
## [15] "exempt_MMR"             "percent_exempt"        
## [17] "percent_medical_exempt" "percent_nonmedexempt"  
## [19] "percent_exempt_MMR"
## 'data.frame':    6696 obs. of  19 variables:
##  $ school_year           : Factor w/ 3 levels "2011","2012",..: 2 2 2 2 2 2 3 3 3 3 ...
##  $ school_code           : chr  "3706" "2308" "2131" "2757" ...
##  $ school_name           : chr  "Rose Hill Junior High" "Kirkland Junior High" "Wapato Middle School" "Satus Elementary" ...
##  $ school_type           : Factor w/ 2 levels "not_Public","Public": 2 2 2 2 2 2 2 2 2 2 ...
##  $ school_city           : chr  "Redmond" "Kirkland" "Wapato" "Wapato" ...
##  $ school_state          : chr  "WA" "WA" "WA" "WA" ...
##  $ school_zip            : chr  "98052" "98033" "98951" "98951" ...
##  $ school_county         : chr  "KING" "KING" "YAKIMA" "YAKIMA" ...
##  $ school_district       : chr  "Lake Washington School District" "Lake Washington School District" "Wapato School District" "Wapato School District" ...
##  $ district_code         : num  17414 17414 39207 39207 39207 ...
##  $ enrolled              : num  5076 4157 3540 3540 3540 ...
##  $ total_exemptions      : int  121 171 8 8 8 8 7 7 7 7 ...
##  $ medical_exempt        : num  15 13 5 5 5 5 5 5 5 5 ...
##  $ nonmedical_exempt     : num  106 161 3 3 3 3 2 2 2 2 ...
##  $ exempt_MMR            : int  73 101 8 8 8 8 7 7 7 7 ...
##  $ percent_exempt        : num  2.38 4.11 0.23 0.23 0.23 0.23 0.2 0.2 0.2 0.2 ...
##  $ percent_medical_exempt: num  0.296 0.313 0.141 0.141 0.141 ...
##  $ percent_nonmedexempt  : num  2.0883 3.873 0.0847 0.0847 0.0847 ...
##  $ percent_exempt_MMR    : num  1.44 2.43 0.23 0.23 0.23 0.23 0.2 0.2 0.2 0.2 ...
##  school_year school_code        school_name            school_type  
##  2011:1934   Length:6696        Length:6696        not_Public: 968  
##  2012:2428   Class :character   Class :character   Public    :5728  
##  2013:2334   Mode  :character   Mode  :character                    
##                                                                     
##                                                                     
##                                                                     
##  school_city        school_state        school_zip       
##  Length:6696        Length:6696        Length:6696       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  school_county      school_district    district_code      enrolled   
##  Length:6696        Length:6696        Min.   : 1109   Min.   :   1  
##  Class :character   Class :character   1st Qu.:17001   1st Qu.: 212  
##  Mode  :character   Mode  :character   Median :20400   Median : 425  
##                                        Mean   :22236   Mean   : 474  
##                                        3rd Qu.:31016   3rd Qu.: 585  
##                                        Max.   :39209   Max.   :5076  
##  total_exemptions medical_exempt    nonmedical_exempt   exempt_MMR   
##  Min.   :  0.0    Min.   :  0.000   Min.   :  0.00    Min.   :  0.0  
##  1st Qu.:  7.0    1st Qu.:  0.000   1st Qu.:  6.00    1st Qu.:  4.0  
##  Median : 19.0    Median :  2.000   Median : 17.00    Median : 11.0  
##  Mean   : 25.5    Mean   :  3.534   Mean   : 22.28    Mean   : 14.8  
##  3rd Qu.: 35.0    3rd Qu.:  4.000   3rd Qu.: 31.00    3rd Qu.: 20.0  
##  Max.   :436.0    Max.   :214.000   Max.   :330.00    Max.   :269.0  
##  percent_exempt   percent_medical_exempt percent_nonmedexempt
##  Min.   :  0.00   Min.   : 0.0000        Min.   :  0.000     
##  1st Qu.:  2.81   1st Qu.: 0.0000        1st Qu.:  2.373     
##  Median :  5.01   Median : 0.3883        Median :  4.357     
##  Mean   :  6.67   Mean   : 0.7957        Mean   :  5.977     
##  3rd Qu.:  7.97   3rd Qu.: 0.9009        3rd Qu.:  7.112     
##  Max.   :100.00   Max.   :33.3333        Max.   :100.000     
##  percent_exempt_MMR
##  Min.   :  0.000   
##  1st Qu.:  1.470   
##  Median :  2.840   
##  Mean   :  4.208   
##  3rd Qu.:  4.680   
##  Max.   :100.000

This is a long-tailed distribution, a feature exhibited by most distributions in this data set. Log transformations are used to visualize the data.

  • Range: 1 - 5076
  • Most of the data falls between ~ 100 - 1000 students per school
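To show what the log transformation does to a long-tailed variable, here is a small base-R sketch. The enrollment values are made up to span the reported range of 1 - 5076, and the bin edges are an arbitrary choice for illustration.

```r
# Toy enrollment values spanning the reported range (made up).
enrolled <- c(1, 12, 95, 212, 425, 474, 585, 900, 2480, 5076)

# On the raw scale the few very large schools stretch the axis.
# log10 compresses the tail: each unit is a tenfold change in enrollment.
log_enrolled <- log10(enrolled)

# Bin on the log scale; plot = FALSE returns the counts without drawing.
# Bins cover 1-10, 10-100, 100-1000 and 1000-10000 students.
h <- hist(log_enrolled, breaks = 0:4, plot = FALSE)
print(h$counts)
```

Dropping `plot = FALSE` draws the histogram on the log10 scale.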

Distribution of total exemptions

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     7.0    19.0    25.5    35.0   436.0
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   10.00   23.00   28.71   38.00  436.00 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    6.00   18.00   23.84   33.00  371.00 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    7.00   18.00   24.56   33.00  357.00
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.81    5.01    6.67    7.97  100.00
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.182   5.320   6.761   8.198  60.000 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.690   4.855   6.642   7.912 100.000 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.700   4.875   6.623   7.920 100.000

  • Range: 0 - 436
  • Rate: 0 - 100%, ~ 6.7% mean for all 3 school years
  • The highest density of data is at ~ 10 - 75 exempt students per school
  • Year-to-year distributions are similar for total and rate
  • Most schools fall into a rate range of 2 - 25%
  • King County has the highest proportion of exemptions
  • Some smaller counties (e.g. Ferry, San Juan) and others appear to have more density at higher exemption rates than King and Pierce counties (the most populous counties in WA)
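The per-year summaries in this section have the shape of base R's `by()` output. A minimal sketch of how such a summary can be produced, using a made-up six-row stand-in for `vaccReported`:

```r
# Toy stand-in for vaccReported (values are made up).
vaccReported <- data.frame(
  school_year      = factor(c("2011", "2011", "2012", "2012", "2013", "2013")),
  total_exemptions = c(10, 38, 6, 33, 7, 33)
)

# Overall six-number summary.
print(summary(vaccReported$total_exemptions))

# The same summary split by school year; prints one block per year.
by_year <- by(vaccReported$total_exemptions, vaccReported$school_year, summary)
print(by_year)

# Per-year means as a plain named vector, convenient for comparisons.
year_means <- tapply(vaccReported$total_exemptions, vaccReported$school_year, mean)
print(year_means)
```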

Note: Exemption rates above ~ 50% appear to be associated with lower levels of enrollment (< 100 students). This will be investigated in the bivariate section.

Medical Exemptions

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   3.534   4.000 214.000
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   2.937   3.000 168.000 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   3.014   4.000 186.000 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   4.569   5.000 214.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.3883  0.7957  0.9009 33.3300
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.3140  0.5583  0.6891 17.6800 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.3711  0.7112  0.8777 33.3300 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.5128  1.0800  1.0980 29.6300

  • Range of 0-214 exemptions
  • Substantial rate increase from 2012 -> 2013
  • Most schools have 1-10 students with medical exemptions
  • The rate range is ~ 0 - 33%, with most schools falling between ~ 0.1 and 5%
  • The mean rate rose from 0.56% to 1.08% from 2011 -> 2013
  • Overall distribution of rate shifts towards higher rates from 2011 -> 2013
  • King County
      • highest proportion
      • substantial shift to higher rates from 2011 -> 2013
      • there is a distinct spike at > 10%
  • Columbia county has not reported any medical exemptions for any year

Non-medical Exemptions

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    6.00   17.00   22.28   31.00  330.00
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.00   20.00   25.77   35.00  327.00 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    5.00   15.00   21.18   29.00  330.00 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    6.00   15.00   20.55   28.00  313.00
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.373   4.357   5.977   7.112 100.000
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.828   4.785   6.202   7.524  60.000 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.272   4.324   6.080   7.147 100.000 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.212   4.037   5.684   6.629 100.000

  • Range 0 - 330, with most schools having ~ 8 - 75 nonmedical exempt students
  • Mean fell by ~ 5 students per school 2011 -> 2013
  • Rate of 0 -100% with most schools at ~ 1 - 20% nonmedical exempt
  • No substantial change in mean rate (~6%) from 2011 -> 2013
  • King County has the largest proportion of nonmedical exempt
  • Ferry and San Juan counties' distributions sit at a higher % than King County's

MMR Exemptions

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     4.0    11.0    14.8    20.0   269.0
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    5.00   12.00   15.29   21.00  229.00 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    4.00   11.00   14.58   20.00  267.00 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    4.00   11.00   14.63   20.00  269.00
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.470   2.840   4.208   4.680 100.000
## vaccReported$school_year: 2011
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.440   2.720   3.840   4.547  54.950 
## -------------------------------------------------------- 
## vaccReported$school_year: 2012
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.518   2.920   4.509   4.812 100.000 
## -------------------------------------------------------- 
## vaccReported$school_year: 2013
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.480   2.845   4.200   4.640 100.000

  • Range 0 - 269, with most schools having 3 - 40 MMR exempt students
  • Rate 0 - 100%, with most schools falling between ~ 0.5 - 20%
  • No substantial change in total mean or mean rate from 2011 -> 2013
  • King County has the largest proportion of MMR exempt
  • Ferry and San Juan counties' distributions sit at a higher % than King County's

Explore Exemption Types

Gather the exemption type columns (melt the data frame), then calculate the % of total exemptions that are of a particular type (medical or non-medical). While this is not a tidy data frame, it is a convenient way to look at observations by exemption type. I am using this to develop a better understanding of the proportion of exemptions that are non-medical.
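The reshaping step can be sketched in base R as follows. This is an illustration with two made-up schools, not the project's own melt/gather call; the output column names `exemption_type`, `exemptions`, and `percent_exemption_type` match the ones in the data frame printed in this section, and `wide`, `long`, `type_cols`, and `id_cols` are hypothetical names.

```r
# Toy wide data: one row per school (values are made up).
wide <- data.frame(
  school_code       = c("A1", "B2"),
  total_exemptions  = c(20, 7),
  medical_exempt    = c(0, 1),
  nonmedical_exempt = c(20, 6),
  stringsAsFactors  = FALSE
)

type_cols <- c("medical_exempt", "nonmedical_exempt")
id_cols   <- setdiff(names(wide), type_cols)

# Build one block of rows per exemption type and stack the blocks.
long <- do.call(rbind, lapply(type_cols, function(col) {
  out <- wide[id_cols]
  out$exemption_type <- col
  out$exemptions     <- wide[[col]]
  out
}))
long$exemption_type <- factor(long$exemption_type, levels = type_cols)

# Share of each school's total exemptions that is of the given type.
long$percent_exemption_type <- 100 * long$exemptions / long$total_exemptions

print(long)
```

Each school now appears once per exemption type, so the long data frame has twice as many rows as the wide one.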

## 'data.frame':    13160 obs. of  19 variables:
##  $ school_year           : Factor w/ 3 levels "2011","2012",..: 2 2 2 3 2 2 1 2 1 2 ...
##  $ school_code           : chr  "1456" "800R" "3808" "3808" ...
##  $ school_name           : chr  "Tacoma Waldorf School" "Rising Tide School" "Waldron Island School" "Waldron Island School" ...
##  $ school_type           : Factor w/ 2 levels "not_Public","Public": 1 1 2 2 2 2 2 1 1 1 ...
##  $ school_city           : chr  "Tacoma" "Olympia" "WALDRON ISLAND" "Waldron Island" ...
##  $ school_state          : chr  "WA" "WA" "WA" "WA" ...
##  $ school_zip            : chr  "98405" "98501" "98297" "98297" ...
##  $ school_county         : chr  "PIERCE" "THURSTON" "SAN JUAN" "SAN JUAN" ...
##  $ school_district       : chr  "Tacoma School District" "Olympia School District" "Orcas Island School District" "Orcas Island School District" ...
##  $ district_code         : num  27010 34111 28137 28137 2250 ...
##  $ enrolled              : num  24 14 12 14 10 82 25 100 91 22 ...
##  $ total_exemptions      : int  20 11 9 10 7 50 15 55 50 12 ...
##  $ exempt_MMR            : int  20 11 8 9 7 27 11 55 50 8 ...
##  $ percent_exempt        : num  83.3 78.6 75 71.4 70 ...
##  $ percent_medical_exempt: num  0 0 0 0 10 ...
##  $ percent_nonmedexempt  : num  91.7 78.6 75 71.4 60 ...
##  $ percent_exempt_MMR    : num  83.3 78.6 66.7 64.3 70 ...
##  $ exemption_type        : Factor w/ 2 levels "medical_exempt",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ exemptions            : num  0 0 0 0 1 2 0 0 0 0 ...
##  [1] "school_year"            "school_code"           
##  [3] "school_name"            "school_type"           
##  [5] "school_city"            "school_state"          
##  [7] "school_zip"             "school_county"         
##  [9] "school_district"        "district_code"         
## [11] "enrolled"               "total_exemptions"      
## [13] "exempt_MMR"             "percent_exempt"        
## [15] "percent_medical_exempt" "percent_nonmedexempt"  
## [17] "percent_exempt_MMR"     "exemption_type"        
## [19] "exemptions"
##  school_year school_code        school_name            school_type   
##  2011:3834   Length:13160       Length:13160       not_Public: 1822  
##  2012:4740   Class :character   Class :character   Public    :11338  
##  2013:4586   Mode  :character   Mode  :character                     
##                                                                      
##                                                                      
##                                                                      
##  school_city        school_state        school_zip       
##  Length:13160       Length:13160       Length:13160      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  school_county      school_district    district_code      enrolled     
##  Length:13160       Length:13160       Min.   : 1109   Min.   :  10.0  
##  Class :character   Class :character   1st Qu.:17001   1st Qu.: 222.8  
##  Mode  :character   Mode  :character   Median :20094   Median : 427.0  
##                                        Mean   :22202   Mean   : 471.2  
##                                        3rd Qu.:31016   3rd Qu.: 586.0  
##                                        Max.   :39209   Max.   :2480.0  
##  total_exemptions   exempt_MMR     percent_exempt   percent_medical_exempt
##  Min.   :  0.00   Min.   :  0.00   Min.   : 0.000   Min.   : 0.0000       
##  1st Qu.:  8.00   1st Qu.:  5.00   1st Qu.: 2.897   1st Qu.: 0.0000       
##  Median : 20.00   Median : 11.00   Median : 5.050   Median : 0.4075       
##  Mean   : 25.82   Mean   : 14.98   Mean   : 6.682   Mean   : 0.8072       
##  3rd Qu.: 35.00   3rd Qu.: 20.00   3rd Qu.: 7.980   3rd Qu.: 0.9117       
##  Max.   :436.00   Max.   :269.00   Max.   :83.330   Max.   :33.3333       
##  percent_nonmedexempt percent_exempt_MMR           exemption_type
##  Min.   : 0.000       Min.   :  0.000    medical_exempt   :6580  
##  1st Qu.: 2.447       1st Qu.:  1.540    nonmedical_exempt:6580  
##  Median : 4.396       Median :  2.870                            
##  Mean   : 5.979       Mean   :  4.206                            
##  3rd Qu.: 7.121       3rd Qu.:  4.690                            
##  Max.   :91.667       Max.   :100.000                            
##    exemptions    
##  Min.   :  0.00  
##  1st Qu.:  1.00  
##  Median :  5.00  
##  Mean   : 13.07  
##  3rd Qu.: 18.00  
##  Max.   :330.00
## 'data.frame':    13160 obs. of  20 variables:
##  $ school_year           : Factor w/ 3 levels "2011","2012",..: 2 2 2 3 2 2 1 2 1 2 ...
##  $ school_code           : chr  "1456" "800R" "3808" "3808" ...
##  $ school_name           : chr  "Tacoma Waldorf School" "Rising Tide School" "Waldron Island School" "Waldron Island School" ...
##  $ school_type           : Factor w/ 2 levels "not_Public","Public": 1 1 2 2 2 2 2 1 1 1 ...
##  $ school_city           : chr  "Tacoma" "Olympia" "WALDRON ISLAND" "Waldron Island" ...
##  $ school_state          : chr  "WA" "WA" "WA" "WA" ...
##  $ school_zip            : chr  "98405" "98501" "98297" "98297" ...
##  $ school_county         : chr  "PIERCE" "THURSTON" "SAN JUAN" "SAN JUAN" ...
##  $ school_district       : chr  "Tacoma School District" "Olympia School District" "Orcas Island School District" "Orcas Island School District" ...
##  $ district_code         : num  27010 34111 28137 28137 2250 ...
##  $ enrolled              : num  24 14 12 14 10 82 25 100 91 22 ...
##  $ total_exemptions      : int  20 11 9 10 7 50 15 55 50 12 ...
##  $ exempt_MMR            : int  20 11 8 9 7 27 11 55 50 8 ...
##  $ percent_exempt        : num  83.3 78.6 75 71.4 70 ...
##  $ percent_medical_exempt: num  0 0 0 0 10 ...
##  $ percent_nonmedexempt  : num  91.7 78.6 75 71.4 60 ...
##  $ percent_exempt_MMR    : num  83.3 78.6 66.7 64.3 70 ...
##  $ exemption_type        : Factor w/ 2 levels "medical_exempt",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ exemptions            : num  0 0 0 0 1 2 0 0 0 0 ...
##  $ percent_exemption_type: num  0 0 0 0 14.3 ...
##  [1] "school_year"            "school_code"           
##  [3] "school_name"            "school_type"           
##  [5] "school_city"            "school_state"          
##  [7] "school_zip"             "school_county"         
##  [9] "school_district"        "district_code"         
## [11] "enrolled"               "total_exemptions"      
## [13] "exempt_MMR"             "percent_exempt"        
## [15] "percent_medical_exempt" "percent_nonmedexempt"  
## [17] "percent_exempt_MMR"     "exemption_type"        
## [19] "exemptions"             "percent_exemption_type"
##  school_year school_code        school_name            school_type   
##  2011:3834   Length:13160       Length:13160       not_Public: 1822  
##  2012:4740   Class :character   Class :character   Public    :11338  
##  2013:4586   Mode  :character   Mode  :character                     
##                                                                      
##                                                                      
##                                                                      
##                                                                      
##  school_city        school_state        school_zip       
##  Length:13160       Length:13160       Length:13160      
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  school_county      school_district    district_code      enrolled     
##  Length:13160       Length:13160       Min.   : 1109   Min.   :  10.0  
##  Class :character   Class :character   1st Qu.:17001   1st Qu.: 222.8  
##  Mode  :character   Mode  :character   Median :20094   Median : 427.0  
##                                        Mean   :22202   Mean   : 471.2  
##                                        3rd Qu.:31016   3rd Qu.: 586.0  
##                                        Max.   :39209   Max.   :2480.0  
##                                                                        
##  total_exemptions   exempt_MMR     percent_exempt   percent_medical_exempt
##  Min.   :  0.00   Min.   :  0.00   Min.   : 0.000   Min.   : 0.0000       
##  1st Qu.:  8.00   1st Qu.:  5.00   1st Qu.: 2.897   1st Qu.: 0.0000       
##  Median : 20.00   Median : 11.00   Median : 5.050   Median : 0.4075       
##  Mean   : 25.82   Mean   : 14.98   Mean   : 6.682   Mean   : 0.8072       
##  3rd Qu.: 35.00   3rd Qu.: 20.00   3rd Qu.: 7.980   3rd Qu.: 0.9117       
##  Max.   :436.00   Max.   :269.00   Max.   :83.330   Max.   :33.3333       
##                                                                           
##  percent_nonmedexempt percent_exempt_MMR           exemption_type
##  Min.   : 0.000       Min.   :  0.000    medical_exempt   :6580  
##  1st Qu.: 2.447       1st Qu.:  1.540    nonmedical_exempt:6580  
##  Median : 4.396       Median :  2.870                            
##  Mean   : 5.979       Mean   :  4.206                            
##  3rd Qu.: 7.121       3rd Qu.:  4.690                            
##  Max.   :91.667       Max.   :100.000                            
##                                                                  
##    exemptions     percent_exemption_type
##  Min.   :  0.00   Min.   :  0           
##  1st Qu.:  1.00   1st Qu.:  8           
##  Median :  5.00   Median : 50           
##  Mean   : 13.07   Mean   :Inf           
##  3rd Qu.: 18.00   3rd Qu.: 92           
##  Max.   :330.00   Max.   :Inf           
##                   NA's   :496
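The `Inf` mean and the 496 `NA`s reported for `percent_exemption_type` in the summary above are what R produces when a count is divided by a `total_exemptions` of zero. That cause is an assumption, since the calculation itself is not shown here. The sketch below reproduces the effect on made-up numbers and shows one possible guard; `raw` and `safe` are hypothetical names.

```r
# Made-up counts; two schools report zero total exemptions.
exemptions       <- c(0, 1, 3, 6)
total_exemptions <- c(0, 7, 0, 6)

# Unguarded division: 0/0 gives NaN (summary() counts it as NA)
# and a positive count over zero gives Inf.
raw <- 100 * exemptions / total_exemptions
print(raw)

# Guarded version: leave the share undefined when there is nothing to split.
safe <- ifelse(total_exemptions > 0,
               100 * exemptions / total_exemptions,
               NA_real_)
print(summary(safe))
```

A positive count over a zero total also points to rows where the type counts and `total_exemptions` disagree, which may be worth checking in the cleaning step.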

Public and private schools have similar distributions and rates of both types of exemptions for all three school years. Public schools have a larger proportion of exemptions of both types.

Univariate Analysis

What is the structure of your dataset?

My data set contains 6696 observations of 19 variables. There are 2 categorical variables:

  • school_year: 3 levels [2011, 2012, 2013]
  • school_type: 2 levels [public, not public]

What is/are the main feature(s) of interest in your dataset?

Number of exemptions per school: total, by type (medical and non-medical), and MMR.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

School enrollment and school type.

Did you create any new variables from existing variables in the dataset?

I created a percentage for total exemptions and for each exemption category by dividing the respective exemption count by enrollment, for each school.
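As a worked check of that calculation, the sketch below recomputes the percentages for the first two schools shown in the `str()` output earlier in this report. The counts are copied from that output; the helper name `pct` is hypothetical.

```r
# Counts for the first two schools in the str() output earlier in this report.
enrolled          <- c(5076, 4157)
total_exemptions  <- c(121, 171)
medical_exempt    <- c(15, 13)
nonmedical_exempt <- c(106, 161)
exempt_MMR        <- c(73, 101)

# Percent of enrolled students holding a given kind of exemption.
pct <- function(count, enrolled) 100 * count / enrolled

percent_exempt         <- pct(total_exemptions, enrolled)
percent_medical_exempt <- pct(medical_exempt, enrolled)
percent_nonmedexempt   <- pct(nonmedical_exempt, enrolled)
percent_exempt_MMR     <- pct(exempt_MMR, enrolled)

print(round(cbind(percent_exempt, percent_medical_exempt,
                  percent_nonmedexempt, percent_exempt_MMR), 2))
```

Rounded to two decimals, these agree with the stored `percent_exempt` (2.38, 4.11) and `percent_exempt_MMR` (1.44, 2.43) values for those schools.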

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I was surprised to see that two of the smaller counties, Ferry and San Juan, had distributions shifted to a higher percent of all exemptions than King County. The narrative in the Seattle media led me to believe that the highest exemption rates would be observed in King County and the Seattle area in general. This is the case on a proportion basis, but not for rate. However, the schools in these smaller counties tend to be much smaller, so the rate shifts dramatically per exempt student. I was also surprised to find that public and private schools have similar distributions, albeit different proportions, for each exemption type.

There was substantial tidying of the original data sets, as per the github links in the introduction. I extracted a subset of the data consisting of only schools that reported exemptions for the three school years. This eliminated ~ 1200 observations but was necessary, since non-reporting schools are missing all exemption values and typically enrollment as well.

Bivariate Plots Section

Group by County and School Type

Group observations by county, and by county & school type (public, not public) for the most recent school year, 2013-2014. A data subset was used that includes only schools with enrollment of 10 - 2500 students.
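A county-level grouping of this kind can be sketched in base R as shown here. The rows are made up and only `percent_exempt` is summarised; the summary column names follow the pattern in the output that follows (`mean_percent_exempt`, `schools`, `students`), while `vacc2013` and `by_county` are hypothetical object names, and rounding to two decimals is an assumption based on how the values are printed.

```r
# Toy 2013-2014 subset (values are made up).
vacc2013 <- data.frame(
  school_county    = c("KING", "KING", "KING", "FERRY", "FERRY"),
  enrolled         = c(500, 700, 300, 40, 60),
  percent_exempt   = c(4, 6, 8, 10, 30),
  stringsAsFactors = FALSE
)

# Split by county, summarise each piece, and bind the rows back together.
by_county <- do.call(rbind, lapply(
  split(vacc2013, vacc2013$school_county),
  function(g) {
    data.frame(
      school_county         = g$school_county[1],
      mean_percent_exempt   = round(mean(g$percent_exempt), 2),
      median_percent_exempt = round(median(g$percent_exempt), 2),
      max_percent_exempt    = max(g$percent_exempt),
      min_percent_exempt    = min(g$percent_exempt),
      schools               = nrow(g),
      students              = sum(g$enrolled),
      stringsAsFactors      = FALSE
    )
  }
))

# Order counties by number of students, largest first.
by_county <- by_county[order(-by_county$students), ]
rownames(by_county) <- NULL
print(by_county)
```

In this toy data the small county has the higher mean rate despite far fewer students, which is the pattern the county comparison is meant to expose. Grouping by county and school type would use the same approach with both columns in the split key.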

##  [1] "school_county"                 "mean_percent_medical_exempt"  
##  [3] "median_percent_medical_exempt" "max_percent_medical_exempt"   
##  [5] "min_percent_medical_exempt"    "mean_percent_exempt_MMR"      
##  [7] "median_percent_exempt_MMR"     "max_percent_exempt_MMR"       
##  [9] "min_percent_exempt_MMR"        "mean_percent_exempt"          
## [11] "median_percent_exempt"         "max_percent_exempt"           
## [13] "min_percent_exempt"            "mean_percent_nonmedexempt"    
## [15] "median_percent_nonmedexempt"   "max_percent_nonmedexempt"     
## [17] "min_percent_nonmedexempt"      "schools"                      
## [19] "students"
##  school_county      mean_percent_medical_exempt
##  Length:39          Min.   :0.0000             
##  Class :character   1st Qu.:0.3250             
##  Mode  :character   Median :0.5900             
##                     Mean   :0.8056             
##                     3rd Qu.:1.1700             
##                     Max.   :1.9900             
##  median_percent_medical_exempt max_percent_medical_exempt
##  Min.   :0.0000                Min.   : 0.000            
##  1st Qu.:0.0000                1st Qu.: 1.425            
##  Median :0.4000                Median : 4.000            
##  Mean   :0.3887                Mean   : 6.151            
##  3rd Qu.:0.5750                3rd Qu.: 8.040            
##  Max.   :2.1100                Max.   :29.630            
##  min_percent_medical_exempt mean_percent_exempt_MMR
##  Min.   :0                  Min.   : 0.880         
##  1st Qu.:0                  1st Qu.: 3.025         
##  Median :0                  Median : 4.420         
##  Mean   :0                  Mean   : 5.170         
##  3rd Qu.:0                  3rd Qu.: 6.360         
##  Max.   :0                  Max.   :14.680         
##  median_percent_exempt_MMR max_percent_exempt_MMR min_percent_exempt_MMR
##  Min.   : 0.730            Min.   : 0.91          Min.   :0.0000        
##  1st Qu.: 2.285            1st Qu.:10.42          1st Qu.:0.0000        
##  Median : 3.300            Median :22.81          Median :0.0000        
##  Mean   : 3.924            Mean   :23.12          Mean   :0.6167        
##  3rd Qu.: 4.770            3rd Qu.:33.33          3rd Qu.:0.4000        
##  Max.   :11.170            Max.   :64.29          Max.   :7.1400        
##  mean_percent_exempt median_percent_exempt max_percent_exempt
##  Min.   : 1.320      Min.   : 1.060        Min.   : 2.13     
##  1st Qu.: 4.390      1st Qu.: 3.570        1st Qu.:15.02     
##  Median : 7.120      Median : 5.610        Median :30.77     
##  Mean   : 7.723      Mean   : 6.429        Mean   :28.47     
##  3rd Qu.: 8.695      3rd Qu.: 7.205        3rd Qu.:42.28     
##  Max.   :22.830      Max.   :23.080        Max.   :71.43     
##  min_percent_exempt mean_percent_nonmedexempt median_percent_nonmedexempt
##  Min.   : 0.000     Min.   : 1.040            Min.   : 0.670             
##  1st Qu.: 0.000     1st Qu.: 3.860            1st Qu.: 2.970             
##  Median : 0.000     Median : 6.440            Median : 5.020             
##  Mean   : 1.415     Mean   : 7.056            Mean   : 5.869             
##  3rd Qu.: 1.315     3rd Qu.: 7.730            3rd Qu.: 6.720             
##  Max.   :14.000     Max.   :22.180            Max.   :23.080             
##  max_percent_nonmedexempt min_percent_nonmedexempt    schools      
##  Min.   : 2.13            Min.   : 0.000           Min.   :  1.00  
##  1st Qu.:13.77            1st Qu.: 0.000           1st Qu.: 13.00  
##  Median :26.56            Median : 0.000           Median : 24.00  
##  Mean   :26.81            Mean   : 1.246           Mean   : 58.79  
##  3rd Qu.:37.98            3rd Qu.: 1.170           3rd Qu.: 53.50  
##  Max.   :71.43            Max.   :12.000           Max.   :581.00  
##     students     
##  Min.   :    28  
##  1st Qu.:  2648  
##  Median :  8244  
##  Mean   : 27302  
##  3rd Qu.: 23350  
##  Max.   :289034
## Classes 'tbl_df', 'tbl' and 'data.frame':    39 obs. of  19 variables:
##  $ school_county                : chr  "KING" "PIERCE" "SNOHOMISH" "CLARK" ...
##  $ mean_percent_medical_exempt  : num  1.99 0.72 1.13 0.59 0.87 0.27 0.96 1.15 0.5 1.2 ...
##  $ median_percent_medical_exempt: num  0.82 0.52 0.68 0.5 0.59 0.16 0.73 0.55 0.21 0.83 ...
##  $ max_percent_medical_exempt   : num  29.63 7.46 12.26 2.3 14.29 ...
##  $ min_percent_medical_exempt   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ mean_percent_exempt_MMR      : num  3.51 2.96 4.42 5.36 5.5 1.37 5.26 4.67 3.19 6.33 ...
##  $ median_percent_exempt_MMR    : num  2.48 2.23 3.3 4.57 3.57 0.76 3.91 3.55 2.26 4.38 ...
##  $ max_percent_exempt_MMR       : num  36.5 33.1 32.2 36.5 44.9 ...
##  $ min_percent_exempt_MMR       : num  0 0 0 0 0 0 0 0.62 0 0 ...
##  $ mean_percent_exempt          : num  6.77 4.42 6.96 7.27 8.15 2.13 7.55 7.44 4.1 9.69 ...
##  $ median_percent_exempt        : num  5.03 3.66 5.42 6.5 6.03 1.38 5.97 5.95 3.21 7.27 ...
##  $ max_percent_exempt           : num  43.9 33.8 42.7 44 44.3 ...
##  $ min_percent_exempt           : num  0 0 0 0 0 0 0 1.25 0 0 ...
##  $ mean_percent_nonmedexempt    : num  4.99 3.88 5.97 6.84 7.3 1.91 6.7 6.44 3.62 8.58 ...
##  $ median_percent_nonmedexempt  : num  3.6 2.97 4.66 6.14 5.35 1.14 4.88 5.13 2.95 6.55 ...
##  $ max_percent_nonmedexempt     : num  42.9 33.8 37.4 42.7 44.3 ...
##  $ min_percent_nonmedexempt     : num  0 0 0 0 0 0 0 0.42 0 0 ...
##  $ schools                      : int  581 253 205 129 165 90 86 78 57 72 ...
##  $ students                     : num  289034 132412 108104 78252 76901 ...
## Observations: 39
## Variables:
## $ school_county                 (chr) "KING", "PIERCE", "SNOHOMISH", "...
## $ mean_percent_medical_exempt   (dbl) 1.99, 0.72, 1.13, 0.59, 0.87, 0....
## $ median_percent_medical_exempt (dbl) 0.82, 0.52, 0.68, 0.50, 0.59, 0....
## $ max_percent_medical_exempt    (dbl) 29.63, 7.46, 12.26, 2.30, 14.29,...
## $ min_percent_medical_exempt    (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ mean_percent_exempt_MMR       (dbl) 3.51, 2.96, 4.42, 5.36, 5.50, 1....
## $ median_percent_exempt_MMR     (dbl) 2.48, 2.23, 3.30, 4.57, 3.57, 0....
## $ max_percent_exempt_MMR        (dbl) 36.50, 33.10, 32.18, 36.48, 44.9...
## $ min_percent_exempt_MMR        (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt           (dbl) 6.77, 4.42, 6.96, 7.27, 8.15, 2....
## $ median_percent_exempt         (dbl) 5.03, 3.66, 5.42, 6.50, 6.03, 1....
## $ max_percent_exempt            (dbl) 43.87, 33.79, 42.70, 43.97, 44.3...
## $ min_percent_exempt            (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_nonmedexempt     (dbl) 4.99, 3.88, 5.97, 6.84, 7.30, 1....
## $ median_percent_nonmedexempt   (dbl) 3.60, 2.97, 4.66, 6.14, 5.35, 1....
## $ max_percent_nonmedexempt      (dbl) 42.86, 33.79, 37.44, 42.67, 44.3...
## $ min_percent_nonmedexempt      (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ schools                       (int) 581, 253, 205, 129, 165, 90, 86,...
## $ students                      (dbl) 289034, 132412, 108104, 78252, 7...
##  [1] "school_county"                 "school_type"                  
##  [3] "mean_percent_medical_exempt"   "median_percent_medical_exempt"
##  [5] "max_percent_medical_exempt"    "min_percent_medical_exempt"   
##  [7] "mean_percent_exempt_MMR"       "median_percent_exempt_MMR"    
##  [9] "max_percent_exempt_MMR"        "min_percent_exempt_MMR"       
## [11] "mean_percent_exempt"           "median_percent_exempt"        
## [13] "max_percent_exempt"            "min_percent_exempt"           
## [15] "mean_percent_nonmedexempt"     "median_percent_nonmedexempt"  
## [17] "max_percent_nonmedexempt"      "min_percent_nonmedexempt"     
## [19] "schools"                       "students"
##  school_county          school_type mean_percent_medical_exempt
##  Length:69          not_Public:30   Min.   :0.0000             
##  Class :character   Public    :39   1st Qu.:0.1900             
##  Mode  :character                   Median :0.6300             
##                                     Mean   :0.9884             
##                                     3rd Qu.:1.2400             
##                                     Max.   :7.1400             
##  median_percent_medical_exempt max_percent_medical_exempt
##  Min.   :0.0000                Min.   : 0.00             
##  1st Qu.:0.0000                1st Qu.: 1.14             
##  Median :0.1600                Median : 2.47             
##  Mean   :0.4633                Mean   : 4.76             
##  3rd Qu.:0.5800                3rd Qu.: 7.27             
##  Max.   :7.1400                Max.   :29.63             
##  min_percent_medical_exempt mean_percent_exempt_MMR
##  Min.   :0.0000             Min.   : 0.870         
##  1st Qu.:0.0000             1st Qu.: 3.100         
##  Median :0.0000             Median : 4.930         
##  Mean   :0.1571             Mean   : 7.024         
##  3rd Qu.:0.0000             3rd Qu.: 8.230         
##  Max.   :7.1400             Max.   :28.570         
##  median_percent_exempt_MMR max_percent_exempt_MMR min_percent_exempt_MMR
##  Min.   : 0.610            Min.   : 0.91          Min.   : 0.000        
##  1st Qu.: 2.400            1st Qu.: 8.82          1st Qu.: 0.000        
##  Median : 3.830            Median :17.39          Median : 0.000        
##  Mean   : 6.021            Mean   :19.72          Mean   : 2.618        
##  3rd Qu.: 6.630            3rd Qu.:30.77          3rd Qu.: 1.550        
##  Max.   :28.570            Max.   :64.29          Max.   :28.570        
##  mean_percent_exempt median_percent_exempt max_percent_exempt
##  Min.   : 1.320      Min.   : 0.980        Min.   : 2.13     
##  1st Qu.: 5.280      1st Qu.: 4.090        1st Qu.:10.53     
##  Median : 7.140      Median : 5.970        Median :23.08     
##  Mean   : 9.782      Mean   : 8.779        Mean   :24.25     
##  3rd Qu.:11.150      3rd Qu.: 9.900        3rd Qu.:37.50     
##  Max.   :35.710      Max.   :35.710        Max.   :71.43     
##  min_percent_exempt mean_percent_nonmedexempt median_percent_nonmedexempt
##  Min.   : 0.000     Min.   : 1.040            Min.   : 0.670             
##  1st Qu.: 0.000     1st Qu.: 4.450            1st Qu.: 3.510             
##  Median : 0.910     Median : 6.720            Median : 5.260             
##  Mean   : 3.905     Mean   : 8.932            Mean   : 7.956             
##  3rd Qu.: 3.230     3rd Qu.:10.530            3rd Qu.: 9.380             
##  Max.   :35.710     Max.   :31.640            Max.   :28.570             
##  max_percent_nonmedexempt min_percent_nonmedexempt    schools      
##  Min.   : 2.13            Min.   : 0.000           Min.   :  1.00  
##  1st Qu.:10.34            1st Qu.: 0.000           1st Qu.:  2.00  
##  Median :21.05            Median : 0.450           Median : 13.00  
##  Mean   :22.77            Mean   : 3.622           Mean   : 33.23  
##  3rd Qu.:36.21            3rd Qu.: 3.180           3rd Qu.: 29.00  
##  Max.   :71.43            Max.   :28.570           Max.   :449.00  
##     students     
##  Min.   :    11  
##  1st Qu.:   336  
##  Median :  2502  
##  Mean   : 15431  
##  3rd Qu.:  9944  
##  Max.   :260309
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  69 obs. of  20 variables:
##  $ school_county                : chr  "ADAMS" "ASOTIN" "ASOTIN" "BENTON" ...
##  $ school_type                  : Factor w/ 2 levels "not_Public","Public": 2 2 1 2 1 2 1 2 1 2 ...
##  $ mean_percent_medical_exempt  : num  0.22 2.29 0 0.36 1.56 0.33 0 0.64 5.34 0.64 ...
##  $ median_percent_medical_exempt: num  0.19 2.29 0 0.22 0 0.22 0 0.46 0.87 0.55 ...
##  $ max_percent_medical_exempt   : num  0.74 2.47 0 2.78 7.27 ...
##  $ min_percent_medical_exempt   : num  0 2.11 0 0 0 0 0 0 0 0 ...
##  $ mean_percent_exempt_MMR      : num  3.71 3.7 2.97 2.9 5.26 ...
##  $ median_percent_exempt_MMR    : num  1.31 3.7 2.97 2.23 4 ...
##  $ max_percent_exempt_MMR       : num  30.77 5.3 2.97 14.7 12.5 ...
##  $ min_percent_exempt_MMR       : num  0.22 2.11 2.97 0 0 0 1.4 0 1.55 0 ...
##  $ mean_percent_exempt          : num  5.57 5.59 4.95 3.77 6.45 ...
##  $ median_percent_exempt        : num  2.03 5.59 4.95 3.19 7.94 ...
##  $ max_percent_exempt           : num  38.46 6.36 4.95 21.09 12.5 ...
##  $ min_percent_exempt           : num  0.87 4.82 4.95 0 0 0.67 1.4 0 4.65 0.15 ...
##  $ mean_percent_nonmedexempt    : num  5.36 3.3 4.95 3.44 4.89 ...
##  $ median_percent_nonmedexempt  : num  1.79 3.3 4.95 2.88 4.8 ...
##  $ max_percent_nonmedexempt     : num  38.46 3.89 4.95 21.09 12.5 ...
##  $ min_percent_nonmedexempt     : num  0.43 2.71 4.95 0 0 0.45 1.4 0 4.65 0 ...
##  $ schools                      : int  13 2 1 50 7 29 2 21 4 113 ...
##  $ students                     : num  6040 615 101 33026 1133 ...
##  - attr(*, "vars")=List of 1
##   ..$ : symbol school_county
##  - attr(*, "indices")=List of 39
##   ..$ : int 0
##   ..$ : int  1 2
##   ..$ : int  3 4
##   ..$ : int  5 6
##   ..$ : int  7 8
##   ..$ : int  9 10
##   ..$ : int 11
##   ..$ : int  12 13
##   ..$ : int 14
##   ..$ : int  15 16
##   ..$ : int  17 18
##   ..$ : int 19
##   ..$ : int  20 21
##   ..$ : int  22 23
##   ..$ : int  24 25
##   ..$ : int  26 27
##   ..$ : int  28 29
##   ..$ : int  30 31
##   ..$ : int  32 33
##   ..$ : int  34 35
##   ..$ : int  36 37
##   ..$ : int 38
##   ..$ : int  39 40
##   ..$ : int  41 42
##   ..$ : int 43
##   ..$ : int 44
##   ..$ : int  45 46
##   ..$ : int  47 48
##   ..$ : int  49 50
##   ..$ : int 51
##   ..$ : int  52 53
##   ..$ : int  54 55
##   ..$ : int  56 57
##   ..$ : int  58 59
##   ..$ : int 60
##   ..$ : int  61 62
##   ..$ : int  63 64
##   ..$ : int  65 66
##   ..$ : int  67 68
##  - attr(*, "group_sizes")= int  1 2 2 2 2 2 1 2 1 2 ...
##  - attr(*, "biggest_group_size")= int 2
##  - attr(*, "labels")='data.frame':   39 obs. of  1 variable:
##   ..$ school_county: chr  "ADAMS" "ASOTIN" "BENTON" "CHELAN" ...
##   ..- attr(*, "vars")=List of 1
##   .. ..$ : symbol school_county
## Observations: 69
## Variables:
## $ school_county                 (chr) "ADAMS", "ASOTIN", "ASOTIN", "BE...
## $ school_type                   (fctr) Public, Public, not_Public, Pub...
## $ mean_percent_medical_exempt   (dbl) 0.22, 2.29, 0.00, 0.36, 1.56, 0....
## $ median_percent_medical_exempt (dbl) 0.19, 2.29, 0.00, 0.22, 0.00, 0....
## $ max_percent_medical_exempt    (dbl) 0.74, 2.47, 0.00, 2.78, 7.27, 1....
## $ min_percent_medical_exempt    (dbl) 0.00, 2.11, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt_MMR       (dbl) 3.71, 3.70, 2.97, 2.90, 5.26, 2....
## $ median_percent_exempt_MMR     (dbl) 1.31, 3.70, 2.97, 2.23, 4.00, 1....
## $ max_percent_exempt_MMR        (dbl) 30.77, 5.30, 2.97, 14.70, 12.50,...
## $ min_percent_exempt_MMR        (dbl) 0.22, 2.11, 2.97, 0.00, 0.00, 0....
## $ mean_percent_exempt           (dbl) 5.57, 5.59, 4.95, 3.77, 6.45, 3....
## $ median_percent_exempt         (dbl) 2.03, 5.59, 4.95, 3.19, 7.94, 2....
## $ max_percent_exempt            (dbl) 38.46, 6.36, 4.95, 21.09, 12.50,...
## $ min_percent_exempt            (dbl) 0.87, 4.82, 4.95, 0.00, 0.00, 0....
## $ mean_percent_nonmedexempt     (dbl) 5.36, 3.30, 4.95, 3.44, 4.89, 3....
## $ median_percent_nonmedexempt   (dbl) 1.79, 3.30, 4.95, 2.88, 4.80, 2....
## $ max_percent_nonmedexempt      (dbl) 38.46, 3.89, 4.95, 21.09, 12.50,...
## $ min_percent_nonmedexempt      (dbl) 0.43, 2.71, 4.95, 0.00, 0.00, 0....
## $ schools                       (int) 13, 2, 1, 50, 7, 29, 2, 21, 4, 1...
## $ students                      (dbl) 6040, 615, 101, 33026, 1133, 129...
## Warning in rm(by_county_exempt, by_county_exempt_type,
## by_county_medexempt, : object 'by' not found

Total Exemptions vs. Enrollment

## geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.

For most of the data (enrollment up to ~1500), there is a linear relationship between total exemptions and number of students enrolled, which isn’t surprising. It would be more interesting to look at the rate, since it is normalized for population.

Exemption rate vs enrollment

## Warning: Removed 67 rows containing missing values (stat_summary).
## Warning: Removed 67 rows containing missing values (geom_point).

Mean percent exempt has high variance at low levels of enrollment. This makes sense, since a small change in the number of exemptions represents a larger proportion of the population in a small school than in a school with higher enrollment.
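As a rough illustration of the point above, the sketch below shows how much a single additional exemption moves the exemption rate at different school sizes. The enrollment values and the 5% underlying rate are hypothetical, and the binomial spread is only a simplifying assumption, not something fitted to the WA DOH data.

```r
# Hypothetical school sizes (not taken from the WA DOH data)
enrolled <- c(20, 100, 500, 1500)

# Change in exemption rate (percentage points) caused by ONE extra exemption
rate_change <- 100 * 1 / enrolled

# Assumption: exemptions behave like binomial draws with a 5% underlying rate.
# Standard deviation of the observed percent exempt at each school size.
p <- 0.05
sd_percent <- 100 * sqrt(p * (1 - p) / enrolled)

data.frame(enrolled, rate_change, sd_percent)
```

Under these assumptions, one extra exemption moves a 20-student school by 5 percentage points but a 500-student school by only 0.2, which is consistent with the wide scatter at low enrollment.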

Percent exemption, both general and specific, is largely flat across enrollment.

Exemption rates vs. total exemptions

## 
##  Pearson's product-moment correlation
## 
## data:  vaccReported$total_exemptions and vaccReported$percent_exempt
## t = 29.4422, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3172190 0.3596349
## sample estimates:
##      cor 
## 0.338599
## 
##  Pearson's product-moment correlation
## 
## data:  vaccReported$total_exemptions and vaccReported$percent_medical_exempt
## t = 23.5303, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2541234 0.2983711
## sample estimates:
##       cor 
## 0.2763937
## 
##  Pearson's product-moment correlation
## 
## data:  vaccReported$total_exemptions and vaccReported$percent_nonmedexempt
## t = 24.2471, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2619730 0.3060127
## sample estimates:
##       cor 
## 0.2841427
## 
##  Pearson's product-moment correlation
## 
## data:  vaccReported$total_exemptions and vaccReported$percent_exempt_MMR
## t = 14.6517, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1529677 0.1993854
## sample estimates:
##       cor 
## 0.1762745

There is a weak positive correlation between total exemptions and the exemption rates, both overall (r ≈ 0.34) and by specific type (r ≈ 0.18 to 0.28). Medical and non-medical exemption rates have about the same correlation with total exemptions (r ≈ 0.28 for both).
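To make the test output above easier to read, here is a small worked check of how the reported t statistic follows from the correlation and the degrees of freedom, using the standard relation t = r * sqrt(df) / sqrt(1 - r^2). The two input numbers are copied from the first test above; the variable names are just illustrative.

```r
# Values copied from the first Pearson test above
r  <- 0.338599   # reported correlation (total_exemptions vs. percent_exempt)
df <- 6694       # reported degrees of freedom (n - 2)

# t statistic for a Pearson correlation
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
t_stat

# Share of variance in one variable linearly explained by the other
r_squared <- r^2
r_squared
```

The result reproduces the reported t of about 29.44. The squared correlation is about 0.11, which is why the relationship is described as weak even though the p-value is tiny: with nearly 6,700 observations, even a small correlation is easily distinguished from zero.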

Specific exemptions vs. enrollment

## 
##  Pearson's product-moment correlation
## 
## data:  vaccReported$enrolled and vaccReported$nonmedical_exempt
## t = 52.0693, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5196347 0.5537363
## sample estimates:
##       cor 
## 0.5369048
## 
##  Pearson's product-moment correlation
## 
## data:  vaccReported$enrolled and vaccReported$nonmedical_exempt
## t = 52.0693, df = 6694, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5196347 0.5537363
## sample estimates:
##       cor 
## 0.5369048

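There is a fairly strong positive correlation between number of students enrolled and both non-medical and MMR exemptions (r ≈ 0.54 for non-medical; the second test above repeats the non-medical test, so no coefficient is reported for MMR).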

Specific exemptions vs. total exemptions

There is an interesting relationship between total exemptions and medical exemptions. The relationship is weak up to about 150 exemptions, then steepens, and becomes negative at about 300 exemptions. Some of this is because there are very few schools beyond 150 exemptions, but note from the point size that this does not track strictly with school size (number of enrolled students).

There is a strong positive correlation between total exemptions and non-medical exemptions. There is also a strong positive correlation between total exemptions and MMR exemptions, though not as strong as for non-medical exemptions.

Specific exemption rates vs. total exemption rate

## 
## Call:
## lm(formula = percent_nonmedexempt ~ percent_exempt, data = vaccReported)
## 
## Coefficients:
##    (Intercept)  percent_exempt  
##        -0.2264          0.9300

## 
## Call:
## lm(formula = percent_exempt_MMR ~ percent_exempt, data = vaccReported)
## 
## Coefficients:
##    (Intercept)  percent_exempt  
##        -0.5163          0.7083

The code is shown here for the sake of clarity. It calculates the predicted MMR and non-medical exemption rates from the linear fits.

# function to solve for y in y = mx + b, for a given x
# lin is a fitted lm object with a single predictor
lm_y <- function(lin, x) {
  m <- coef(lin)[2]  # slope
  b <- coef(lin)[1]  # intercept
  return((m * x) + b)
}


lm_y(linear_MMR_rate,10)
## percent_exempt 
##       6.566816
lm_y(linear_nonmed_rate,10)
## percent_exempt 
##       9.074078
# function to solve for x in y = mx + b, for a given y
lm_x <- function(lin, y) {
  m <- coef(lin)[2]  # slope
  b <- coef(lin)[1]  # intercept
  return((y - b) / m)
}

# solve for the lm-predicted total exemption rate at a 5% MMR exemption rate
lm_x(linear_MMR_rate,5)
## (Intercept) 
##    7.787952
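
As a cross-check on the helpers above, the same forward and inverse calculations can be sketched with predict() on a fitted model. The toy data frame below is hypothetical and only stands in for vaccReported; the column names mirror the ones in the lm calls above.

```r
# Hypothetical toy data; NOT the vaccReported data
toy <- data.frame(
  percent_exempt     = c(2, 4, 6, 8, 10, 12),
  percent_exempt_MMR = c(1.0, 2.4, 3.6, 5.2, 6.5, 8.1)
)
fit <- lm(percent_exempt_MMR ~ percent_exempt, data = toy)

# Forward prediction: fitted MMR rate at a 10% total exemption rate
y_hat <- predict(fit, newdata = data.frame(percent_exempt = 10))

# Same value by hand from the coefficients (what lm_y does)
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["percent_exempt"]] * 10

# Inverse: total exemption rate at which the fitted line reaches 5% MMR
# (what lm_x does)
x_at_5 <- (5 - coef(fit)[["(Intercept)"]]) / coef(fit)[["percent_exempt"]]
```

Indexing the coefficients by name with [[ ]] drops the name attribute, so the results print as plain numbers rather than carrying labels such as "(Intercept)".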

Medical exemption rates show a weak positive correlation to total exemption rates. Both non-medical and MMR exemption rates show a strong positive linear relationship to total exemption rates, with fitted slopes of about 0.93 and 0.71, respectively.

Analyzing the grouped data

The grouped data described above are used for the following sections.

Exemptions by school year

Except for the astounding number of outliers, there is not much to see from year to year; please move along.

Exemptions by school type

Public schools have a higher median and larger spread of total exemptions, non-medical exemptions, and MMR exemptions. The spread is very tight for medical exemptions, with comparable medians. Some of the variability in the public schools can be explained by the fact that public schools have a larger range of enrolled students. However, this doesn’t explain observations for medical exemptions.

Exemption rates by school year

Second verse same as the first.

Exemption rates by school type

This is interesting to me. The median level is about the same for public vs. private (not-public) schools, but the spread is much greater for private schools. I’ve looked at analyses from other states that essentially show that not all private schools, even within a type of private school (Waldorf, Montessori), are created equal in terms of vaccination attitudes (tendency towards personal/non-medical exemptions). I don’t know if this is true of my data set, but there seems to be more variability in private vs. public schools. These plots suggest this is true of non-medical and MMR exemptions, but not medical exemptions.

Medical exemption rates do not have a strong relationship with either non-medical or MMR exemption rates. Non-medical exemption rates and MMR exemption rates have a strong positive relationship.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

  • Total exemptions tracked with enrollment up to ~ 1500 students, but with high variance.
  • Exemption rates, both general and specific, do not track with enrollment, but there is a weak positive correlation between exemption rates and total exemptions (which makes mathematical sense).
  • Non-medical and MMR exemption counts have a strong positive correlation with enrollment, but medical exemption counts do not.
  • A very striking relationship is that between non-medical and MMR exemption rates and total exemption rate, in sharp contrast to the relationship between medical exemption rates and total exemption rate.
  • Medical exemption rates do not have a strong relationship with either non-medical or MMR exemption rates. Non-medical exemption rates and MMR exemption rates have a strong positive relationship.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

The median level is about the same for public vs. private (not-public) schools, but the spread is much greater for private schools. I’ve looked at analyses from other states that essentially show that not all private schools, even within a type of private school (Waldorf, Montessori), are created equal in terms of vaccination attitudes (tendency towards personal/non-medical exemptions). I don’t know if this is true of my data set, but there seems to be more variability in private vs. public schools. Plots suggest this is true of non-medical and MMR exemptions, but not medical exemptions.

Public schools have a higher median and larger spread of total exemptions, non-medical exemptions, and MMR exemptions. The spread is very tight for medical exemptions, with comparable medians. Some of the variability in the public schools can be explained by the fact that public schools have a larger range of enrolled students. However, this doesn’t explain observations for medical exemptions.

What was the strongest relationship you found?

  • The strongest relationships were between non-medical exemption rate and total exemption rate, and between MMR exemption rate and non-medical exemption rate.

Multivariate Plots Section

Technically, some of the bivariate plots could be considered multivariate, since rates are derived from more than one variable. Below I try to understand whether relationships vary significantly as a function of year or school type. I also include some other visualizations to help foster an understanding of the data.

Rates vs. total exemptions, by year

  • Observing year to year variation, particularly beyond 100 total exemptions

Specific exemptions vs. total exemptions by school year and by school type

  • Observing year to year variation, particularly beyond 100 total exemptions

  • Observing year to year variation, particularly beyond 100 total exemptions
  • observing little variation in non-medical exempt by type
  • observing a divergence in rate between public and private schools for MMR exemptions

Rate vs. rate by year

  • observing minimal variation for percent non-medical exempt by year and school type
  • observing some variation for 2011 percent exempt MMR at ~ > 25% total exemptions by year
  • observing some variation for 2011 percent exempt MMR at ~ > 40% total exemptions by school type

Add 5 and 10% Vertical lines.

Vertical lines are added at 5% and 10%, which are estimates of the safety thresholds for MMR and total exemption rates, respectively (see intro for further detail). This is for purposes of illustration.

Repeat on the subset of vaccReported with enrollment of >= 100 and <= 2500.

  • The majority of the data falls below both thresholds
  • The school types start to diverge near the intersection of 5% MMR exemption rate and 10% total exemption rate
  • There is a subset of data that is above the MMR rate threshold but below the total exemption rate threshold
  • There is a subset of data that is below the MMR rate threshold but above the total exemption rate threshold
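
The four regions described in the bullets above can be made explicit by labeling each school according to which threshold it exceeds. This is only a sketch: the four schools and their rates are hypothetical, and the 5% and 10% cut-offs are the illustrative thresholds used in this section, not official limits.

```r
# Hypothetical per-school rates in percent (not taken from the WA DOH data)
schools <- data.frame(
  school             = c("A", "B", "C", "D"),
  percent_exempt_MMR = c(2.0, 6.5, 3.0, 8.0),
  percent_exempt     = c(4.0, 8.0, 12.0, 15.0)
)

mmr_threshold   <- 5    # illustrative MMR exemption threshold (percent)
total_threshold <- 10   # illustrative total exemption threshold (percent)

# Label each school by the threshold(s) it exceeds
schools$quadrant <- with(schools, ifelse(
  percent_exempt_MMR > mmr_threshold,
  ifelse(percent_exempt > total_threshold, "above both", "above MMR only"),
  ifelse(percent_exempt > total_threshold, "above total only", "below both")
))

# Count of schools in each region
table(schools$quadrant)
```

Applied to the real data, a table like this would put numbers on how large each of the four groups is, rather than reading it off the scatter plot.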

Bar Plots

Bar plots using the county_group and county_group_type data frames.

Mean Total and MMR exemption rates by school type

County level data for mean MMR and total exemption rates. The y axis is ordered by number of students enrolled for each county. I used position = dodge for side by side comparison and position = fill to show the proportional contribution of each school type to the mean rate.

It is very clear from these bar plots that there are many counties with mean total and MMR exemption rates exceeding the 10% and 5% thresholds, respectively. It is also clear that this does not track with student population: there are several smaller counties with very high rates of exemption. However, the total number of exemptions in these counties is much smaller than in larger counties. For example, Ferry has only 678 students for the 2013-2014 school year, spread across several schools.
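
For readers unfamiliar with the two bar positions mentioned above, the following is a minimal sketch of a dodged and a filled bar plot in ggplot2. It uses only the first five county/school-type rows visible in the structure output earlier in this report, and it is not the code that produced the actual figures. Ordering counties with reorder() on the per-row student counts is a simplification of ordering by county enrollment.

```r
library(ggplot2)

# First five rows of the county-by-school-type summary shown earlier
county_type <- data.frame(
  school_county       = c("ADAMS", "ASOTIN", "ASOTIN", "BENTON", "BENTON"),
  school_type         = c("Public", "Public", "not_Public", "Public", "not_Public"),
  mean_percent_exempt = c(5.57, 5.59, 4.95, 3.77, 6.45),
  students            = c(6040, 615, 101, 33026, 1133)
)

# Shared base: counties ordered by students, drawn on the vertical axis
base <- ggplot(county_type,
               aes(x = reorder(school_county, students),
                   y = mean_percent_exempt,
                   fill = school_type)) +
  xlab("County (ordered by students)") +
  coord_flip()

# Side-by-side bars for direct comparison of school types
p_dodge <- base +
  geom_bar(stat = "identity", position = "dodge") +
  ylab("Mean percent exempt")

# Bars rescaled to 1 to show each school type's share within a county
p_fill <- base +
  geom_bar(stat = "identity", position = "fill") +
  ylab("Share of summed mean rate")
```

Printing p_dodge or p_fill draws the plot. Note that with position = "fill" the axis shows proportions, so the absolute exemption rates can no longer be read from the bars.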

By school type

County level data for mean MMR and total exemption rates, by school type. The y axis is ordered by number of students enrolled for each county.

For both MMR and total exemption rates, private schools are comparable to or exceed public school exemption rates for many counties. Private schools also contribute 50% or more to the MMR and total exemption rates for most counties.

By exemption type

Melting the county_group data frame for additional bar plot visualizations. The melt is achieved with gather() [tidyr package], where mean_percent_medical_exempt and mean_percent_nonmedexempt are gathered into an exemption type column, with each individual mean captured in mean_percent_exemption_type.
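
A minimal sketch of this kind of melt is shown below, using only the first three counties and the two mean columns from the summary output earlier in this report. The key column name exemption_type is an assumption (the text only says "exemption type"); the value column name follows the text.

```r
library(tidyr)

# First three rows of the county summary shown earlier (two columns only)
county_small <- data.frame(
  school_county               = c("KING", "PIERCE", "SNOHOMISH"),
  mean_percent_medical_exempt = c(1.99, 0.72, 1.13),
  mean_percent_nonmedexempt   = c(4.99, 3.88, 5.97),
  stringsAsFactors = FALSE
)

# Gather the two mean columns into key/value pairs: one row per
# county and exemption type
county_long <- gather(
  county_small,
  key   = "exemption_type",               # assumed column name
  value = "mean_percent_exemption_type",
  mean_percent_medical_exempt, mean_percent_nonmedexempt
)

county_long
```

The number of rows doubles because each county now appears once per exemption type, which matches the jump from 39 to 78 observations in the output that follows.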

## Observations: 39
## Variables:
## $ school_county                 (chr) "KING", "PIERCE", "SNOHOMISH", "...
## $ mean_percent_medical_exempt   (dbl) 1.99, 0.72, 1.13, 0.59, 0.87, 0....
## $ median_percent_medical_exempt (dbl) 0.82, 0.52, 0.68, 0.50, 0.59, 0....
## $ max_percent_medical_exempt    (dbl) 29.63, 7.46, 12.26, 2.30, 14.29,...
## $ min_percent_medical_exempt    (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ mean_percent_exempt_MMR       (dbl) 3.51, 2.96, 4.42, 5.36, 5.50, 1....
## $ median_percent_exempt_MMR     (dbl) 2.48, 2.23, 3.30, 4.57, 3.57, 0....
## $ max_percent_exempt_MMR        (dbl) 36.50, 33.10, 32.18, 36.48, 44.9...
## $ min_percent_exempt_MMR        (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt           (dbl) 6.77, 4.42, 6.96, 7.27, 8.15, 2....
## $ median_percent_exempt         (dbl) 5.03, 3.66, 5.42, 6.50, 6.03, 1....
## $ max_percent_exempt            (dbl) 43.87, 33.79, 42.70, 43.97, 44.3...
## $ min_percent_exempt            (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_nonmedexempt     (dbl) 4.99, 3.88, 5.97, 6.84, 7.30, 1....
## $ median_percent_nonmedexempt   (dbl) 3.60, 2.97, 4.66, 6.14, 5.35, 1....
## $ max_percent_nonmedexempt      (dbl) 42.86, 33.79, 37.44, 42.67, 44.3...
## $ min_percent_nonmedexempt      (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ schools                       (int) 581, 253, 205, 129, 165, 90, 86,...
## $ students                      (dbl) 289034, 132412, 108104, 78252, 7...
## Observations: 78
## Variables:
## $ school_county                 (chr) "KING", "PIERCE", "SNOHOMISH", "...
## $ median_percent_medical_exempt (dbl) 0.82, 0.52, 0.68, 0.50, 0.59, 0....
## $ max_percent_medical_exempt    (dbl) 29.63, 7.46, 12.26, 2.30, 14.29,...
## $ min_percent_medical_exempt    (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ mean_percent_exempt_MMR       (dbl) 3.51, 2.96, 4.42, 5.36, 5.50, 1....
## $ median_percent_exempt_MMR     (dbl) 2.48, 2.23, 3.30, 4.57, 3.57, 0....
## $ max_percent_exempt_MMR        (dbl) 36.50, 33.10, 32.18, 36.48, 44.9...
## $ min_percent_exempt_MMR        (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt           (dbl) 6.77, 4.42, 6.96, 7.27, 8.15, 2....
## $ median_percent_exempt         (dbl) 5.03, 3.66, 5.42, 6.50, 6.03, 1....
## $ max_percent_exempt            (dbl) 43.87, 33.79, 42.70, 43.97, 44.3...
## $ min_percent_exempt            (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ median_percent_nonmedexempt   (dbl) 3.60, 2.97, 4.66, 6.14, 5.35, 1....
## $ max_percent_nonmedexempt      (dbl) 42.86, 33.79, 37.44, 42.67, 44.3...
## $ min_percent_nonmedexempt      (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ schools                       (int) 581, 253, 205, 129, 165, 90, 86,...
## $ students                      (dbl) 289034, 132412, 108104, 78252, 7...
## $ exemption_type                (fctr) mean_percent_medical_exempt, me...
## $ mean_percent_exemption_type   (dbl) 1.99, 0.72, 1.13, 0.59, 0.87, 0....
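
The melt step can be illustrated with a small sketch. It uses only the first three counties and values shown in the output above; county_group_demo is a hypothetical stand-in for county_group, which has many more columns.

```r
library(tidyr)

# Toy stand-in for county_group, using the first three rows of the output above
county_group_demo <- data.frame(
  school_county = c("KING", "PIERCE", "SNOHOMISH"),
  mean_percent_medical_exempt = c(1.99, 0.72, 1.13),
  mean_percent_nonmedexempt   = c(4.99, 3.88, 5.97),
  stringsAsFactors = FALSE
)

# gather() stacks the two mean columns into key/value pairs:
# the key column records which mean it is, the value column holds the mean itself
melt_demo <- gather(
  county_group_demo,
  key = "exemption_type",
  value = "mean_percent_exemption_type",
  mean_percent_medical_exempt, mean_percent_nonmedexempt
)

# Each county now appears once per exemption type (3 counties x 2 types = 6 rows)
print(melt_demo)
```

The same doubling explains the observation counts above: 39 counties with two gathered columns give 78 rows.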

Non-medical exemptions far exceed medical exemptions for all counties, contributing 70-100% of the total exemption rate, on a proportion basis.

Add a quartile field into melt_county_group, as a factor.

##  [1] 2 2 2 3 3 1 3 3 1 4 1 3 1 1 1 1 2 4 4 2 3 2 1 2 2 3 4 4 4 1 4 2 4 4 2
## [36] 4 1 3 3 2 2 2 3 3 1 3 3 1 4 1 3 1 1 1 1 2 4 4 2 3 2 1 2 2 3 4 4 4 1 4
## [71] 2 4 4 2 4 1 3 3
## Levels: 1 2 3 4
## Observations: 78
## Variables:
## $ school_county                 (chr) "KING", "PIERCE", "SNOHOMISH", "...
## $ median_percent_medical_exempt (dbl) 0.82, 0.52, 0.68, 0.50, 0.59, 0....
## $ max_percent_medical_exempt    (dbl) 29.63, 7.46, 12.26, 2.30, 14.29,...
## $ min_percent_medical_exempt    (dbl) 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ mean_percent_exempt_MMR       (dbl) 3.51, 2.96, 4.42, 5.36, 5.50, 1....
## $ median_percent_exempt_MMR     (dbl) 2.48, 2.23, 3.30, 4.57, 3.57, 0....
## $ max_percent_exempt_MMR        (dbl) 36.50, 33.10, 32.18, 36.48, 44.9...
## $ min_percent_exempt_MMR        (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ mean_percent_exempt           (dbl) 6.77, 4.42, 6.96, 7.27, 8.15, 2....
## $ median_percent_exempt         (dbl) 5.03, 3.66, 5.42, 6.50, 6.03, 1....
## $ max_percent_exempt            (dbl) 43.87, 33.79, 42.70, 43.97, 44.3...
## $ min_percent_exempt            (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ median_percent_nonmedexempt   (dbl) 3.60, 2.97, 4.66, 6.14, 5.35, 1....
## $ max_percent_nonmedexempt      (dbl) 42.86, 33.79, 37.44, 42.67, 44.3...
## $ min_percent_nonmedexempt      (dbl) 0.00, 0.00, 0.00, 0.00, 0.00, 0....
## $ schools                       (int) 581, 253, 205, 129, 165, 90, 86,...
## $ students                      (dbl) 289034, 132412, 108104, 78252, 7...
## $ exemption_type                (fctr) mean_percent_medical_exempt, me...
## $ mean_percent_exemption_type   (dbl) 1.99, 0.72, 1.13, 0.59, 0.87, 0....
## $ mean_percent_exempt_quartile  (int) 2, 2, 2, 3, 3, 1, 3, 3, 1, 4, 1,...
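
One way to add such a quartile field is sketched below. This is an assumption about the approach, not necessarily the code that produced the output above: it uses dplyr's ntile() on mean_percent_exempt, and the county labels and values are made up.

```r
library(dplyr)

# Hypothetical counties and rates, for illustration only
demo <- data.frame(
  school_county = paste0("county_", 1:8),
  mean_percent_exempt = c(2.1, 3.4, 4.4, 5.0, 6.8, 7.3, 8.2, 11.5)
)

# ntile() ranks the values and splits them into 4 roughly equal-sized buckets;
# bucket 4 holds the counties with the highest mean exemption rates
demo <- demo %>%
  mutate(mean_percent_exempt_quartile = ntile(mean_percent_exempt, 4))

# Convert to a factor when the quartile should be treated as a category in plots
quartile_factor <- factor(demo$mean_percent_exempt_quartile)

print(demo)
print(levels(quartile_factor))
```

Filtering on the fourth bucket then selects the top 25% of counties by mean exemption rate.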

Below I use position = stack and add labels. My intent was to show the total exemption rate, the contribution from each type of exemption, and the number of students in the county. I then shade at 8% total exemption (corresponding to a roughly predicted 5% MMR exemption rate) to show the danger zone. However, it looks like stack is just “stacking” the values rather than adjusting proportions. The plots also suffer from other shortcomings. I’m leaving them here just to illustrate the code, and I hope that coaches will have some suggestions for improvements.
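
The difference between the two positions can be seen in a small sketch. In ggplot2, position = "stack" places the raw values on top of each other, so the bar height is their sum, while position = "fill" rescales each bar to a total of 1 so only proportions are shown. The data below use the first three counties from the earlier output; the shortened exemption type labels and the choice to shade everything above 8% are assumptions for the example.

```r
library(ggplot2)

demo <- data.frame(
  school_county  = rep(c("KING", "PIERCE", "SNOHOMISH"), times = 2),
  exemption_type = rep(c("medical", "nonmedical"), each = 3),
  mean_percent_exemption_type = c(1.99, 0.72, 1.13, 4.99, 3.88, 5.97)
)

base <- ggplot(demo, aes(x = school_county,
                         y = mean_percent_exemption_type,
                         fill = exemption_type))

# Stack: bar length is the sum of the two means, in percent
p_stack <- base +
  geom_bar(stat = "identity", position = "stack") +
  # Shade the region above 8% (added after the bars so the discrete x scale is set first)
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = 8, ymax = Inf,
           fill = "red", alpha = 0.2) +
  coord_flip() +
  labs(x = "County", y = "Mean % exempt")

# Fill: every bar is rescaled to 1, showing only each type's share
p_fill <- base +
  geom_bar(stat = "identity", position = "fill") +
  coord_flip() +
  labs(x = "County", y = "Proportion of exemptions")
```

Printing p_stack and p_fill side by side shows why a threshold shade only makes sense on the stacked version: the filled version has no percent scale to shade against.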

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

I didn’t observe any relationships that stand out from the bivariate plots section.

Were there any interesting or surprising interactions between features?

There were no surprising interactions. However, it was interesting to see that some of the highest mean exemption rates were in smaller counties. Previously I had noticed a large variance in exemption rates at the smaller end of the enrollment spectrum, which is consistent with what I’m seeing at the county level.

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.


Final Plots and Summary

Plot One

## geom_smooth: method="auto" and size of largest group is >=1000, so using gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the smoothing method.

Description One

Specific rates (% medical, nonmedical, and MMR exemption) vs. total percent exempt for schools with <= 50% total exemption, with smoothing lines per school year. Medical exemption rates do not have strong correlation with total exemption rates whereas non-medical and MMR exemption rates do. The correlation of non-medical exemption rates with the total exemption rate is stronger than that of MMR rates. There is very little change in the smoothing lines and degree of correlation from year to year.
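
The comparison of correlations per school year can be sketched numerically as follows. This is only an illustration: the school-level column names and every value are hypothetical, and Pearson correlation via cor() is an assumption, since the description above is based on the smoothing lines rather than a stated statistic.

```r
library(dplyr)

# Hypothetical school-level rows for two school years
schools_demo <- data.frame(
  school_year = rep(c("2012-2013", "2013-2014"), each = 5),
  percent_exempt         = c(2, 5, 10, 20, 60,   3, 6, 12, 30, 55),
  percent_nonmedexempt   = c(1.5, 4, 9, 19, 58,  2, 5.5, 10, 29, 54),
  percent_medical_exempt = c(0.5, 1, 1, 1, 2,    1, 0.5, 2, 1, 1)
)

cor_by_year <- schools_demo %>%
  filter(percent_exempt <= 50) %>%   # same cutoff as the plot: <= 50% total exemption
  group_by(school_year) %>%
  summarise(
    cor_nonmedical = cor(percent_exempt, percent_nonmedexempt),
    cor_medical    = cor(percent_exempt, percent_medical_exempt)
  )

print(cor_by_year)
```

With the real data, similar values across the rows of cor_by_year would correspond to smoothing lines that barely change from year to year.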

Plot Two

Description Two

Mean percent exemption rate per county, ordered by student population (number of students enrolled), for the school year 2013 and schools with 10-2500 total students. The gradient color scale represents mean % exempt MMR for each county. Neither the mean total nor the mean MMR exemption rate tracks student population. Counties with higher mean total exemption rates tend to have higher mean MMR exemption rates, consistent with plot one.

Plot Three

Description Three

Mean % exemption rate and proportion % exempt by exemption type for counties in quartile four (top 25% of mean exemption rates), along with the affected student population. A red shade is used to show the area at and beyond what is generally considered the threshold for herd immunity.

All but two counties in quartile 4 are at or beyond the 10% herd immunity threshold. The vast majority of exemptions for each county are non-medical exemptions, also known as personal belief exemptions.

It is important to note that herd immunity thresholds are based on specific assumptions, which include opportunity for exposure, how contagious a disease is, and population mixing. These may not apply evenly to all schools. Measles is highly contagious, with a large opportunity for exposure relative to, say, Hep B. Estimates for measles put a safe level of exemption closer to 5%.


Reflection

This has been an interesting project. My original intent was to use the zip code data and data from the US Census Bureau (economic and demographic data down to the tract level) and the WA state government (demographic data at the school level) to explore school vaccine exemptions and exemption rates. Cleaning and tidying the data set for this purpose was daunting, but enjoyable. As a step towards my original intent, I decided to start exploring the data to develop a better understanding of the set and how exemption rates varied by type, year, student population, etc. What I found was that I had a rich data set all on its own. I was able to view the high level of variance in the data, particularly associated with school size, as well as how types of exemptions and types of schools were contributing to exemption rates. I made some surprising findings:

and some less than surprising findings:

In general, the body of scientific evidence suggests that vaccine exemption rates above 5-10% (depending on disease attributes) put the non-immune population (those with no natural immunity or low resistance who have not received the vaccine, or for whom the vaccine was not effective) at risk. In WA state, the vast majority of exemptions are personal belief (non-medical) exemptions. At the county level, there are 6 counties with mean exemption rates exceeding 10%, and even more exceed the 5% MMR threshold.

Now that I have a solid understanding of the underlying data I would like to do the following as next steps.

I also plan to revisit the data set containing other disease exemption data to see how each of those varies with the total exemption rate and to get a better idea of which diseases people are not being vaccinated for at specific geographical levels.

References